This is the combined dataset from
/library/s4_lectures/3_network_science/03_02_mentorship_network_paths.ipynb
(in the notebook, its final name is connect_names).
Codebook:
- CID: unique identifier of each connection.
- MenteeID: unique identifier of the trainee.
- MentorID: unique identifier of the mentor.
- MentorshipType: integer coding the type of relationship (what does “4” mean?):
  - 0 = undergrad research assistant
  - 1 = graduate student
  - 2 = postdoctoral fellow
  - 3 = research scientist
- Institution: name of the institution where the training took place, as a string (it’s raw; is there a solution on zenodo.org?).
- StopYear: year of graduation or of training completion (what is -1?).
- gender_t, gender_m: gender inferred from the first name (using the dataset available at zenodo.org).
- ResearchArea_t, ResearchArea_m: first research area (from the full list of each person’s areas, only the first is taken).
## Rows: 742,766
## Columns: 10
## $ CID <dbl> 2, 3, 5, 17, 18, 19, 25, 36, 44, 58, 106, 105, 111, 1, …
## $ MenteeID <dbl> 2, 4, 6, 27, 28, 8, 5, 17, 7, 60, 33, 105, 108, 1, 521,…
## $ MentorID <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ MentorshipType <chr> "1=graduate student", "2=postdoctoral fellow", "1=gradu…
## $ Institution <chr> "University of California, Berkeley", "University of Ca…
## $ StopYear <dbl> 2005, 2006, 2008, -1, -1, 2006, 2009, 2002, 2004, 2007,…
## $ ResearchArea_t <chr> "neuro", "neuro", "neuro", "neuro", "neuro", "neuro", "…
## $ ResearchArea_m <chr> "neuro", "neuro", "neuro", "neuro", "neuro", "neuro", "…
## $ gender_t <chr> "man", "man", "man", "man", "woman", "woman", "unknown"…
## $ gender_m <chr> "man", "man", "man", "man", "man", "man", "man", "man",…
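
The preview above shows MentorshipType stored as a combined string such as "1=graduate student" rather than the bare integer described in the codebook. A minimal sketch of splitting it back into a code and a label follows. The helper names are hypothetical, the third row is invented for illustration, and treating -1 in StopYear as "missing" is only an assumption, since the codebook leaves that question open.

```python
from typing import Optional, Tuple


def split_mentorship_type(raw: str) -> Tuple[int, str]:
    """Split a label such as '1=graduate student' into (1, 'graduate student')."""
    code, _, label = raw.partition("=")
    return int(code), label.strip()


def stop_year_or_none(year: float) -> Optional[int]:
    # ASSUMPTION: -1 is a placeholder for an unknown year.
    # The codebook does not confirm this.
    return None if year == -1 else int(year)


# The first two rows follow the preview; the third is a made-up example.
rows = [
    {"CID": 2, "MentorshipType": "1=graduate student", "StopYear": 2005},
    {"CID": 3, "MentorshipType": "2=postdoctoral fellow", "StopYear": 2006},
    {"CID": 999, "MentorshipType": "4=?", "StopYear": -1},
]

parsed = [
    {
        "CID": row["CID"],
        "type_code": split_mentorship_type(row["MentorshipType"])[0],
        "type_label": split_mentorship_type(row["MentorshipType"])[1],
        "stop_year": stop_year_or_none(row["StopYear"]),
    }
    for row in rows
]

for record in parsed:
    print(record)
```

Keeping the code and the label in separate fields makes it easier to filter on the unexplained code 4 without relying on string matching.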
| | Overall (N=742766) |
|---|---|
| as.character(MentorshipType) | |
| 0=undergrad research assistant | 18838 (2.5%) |
| 1=graduate student | 630099 (84.8%) |
| 2=postdoctoral fellow | 68618 (9.2%) |
| 3=research scientist | 7402 (1.0%) |
| 4=? | 17809 (2.4%) |
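
As a quick consistency check, the shares in the table can be recomputed from the counts. The sketch below only uses the numbers printed above; the variable names are illustrative.

```python
# Counts copied from the MentorshipType table above.
counts = {
    "0=undergrad research assistant": 18838,
    "1=graduate student": 630099,
    "2=postdoctoral fellow": 68618,
    "3=research scientist": 7402,
    "4=?": 17809,
}

total = sum(counts.values())

# Share of each category, formatted to one decimal place as in the table.
shares = {label: f"{100 * n / total:.1f}%" for label, n in counts.items()}

print(f"N = {total}")
for label, n in counts.items():
    print(f"{label:32s} {n:7d} ({shares[label]})")
```

The counts add up to the reported N, so the undocumented code 4 accounts for the remaining rows rather than for missing values.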
Family 1. Single parent with child
Family 2. Single parent: large family with twins
Family 3. Two parents with a child
Family 4. Mixed, with stepparents/stepbrothers/stepsisters
Family 5. …
Conditions for a trainee entering the “Family”:
- through the “parent” (mentor): a necessary condition;
- through a study period that overlaps with other trainees’, AND/OR the trainees share the same research area (?), AND/OR the trainees co-published during their training with one mentor and up to n years after StopYear (?).

Conditions for a mentor entering the “Family” (?): do we need to look at the reverse logic?
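
The trainee-side conditions above could be prototyped roughly as follows. This is a sketch under stated assumptions, not a settled definition: the function name `sibling_pairs`, the parameters `max_year_gap` and `same_area`, and the toy records are all hypothetical. Because the data has only StopYear and no start year, "overlapping study period" is approximated here by a maximum gap between StopYear values, and rows with StopYear of -1 would need to be excluded beforehand. Co-publications are not covered, since the dataset has no publication columns.

```python
from collections import defaultdict
from itertools import combinations

# Toy records (hypothetical values): (MenteeID, MentorID, StopYear, ResearchArea_t)
records = [
    (1, 10, 2005, "neuro"),
    (2, 10, 2006, "neuro"),
    (3, 10, 2015, "chem"),
    (4, 11, 2006, "neuro"),
]


def sibling_pairs(records, max_year_gap=None, same_area=False):
    """Return (MentorID, mentee_a, mentee_b) triples for trainees in one 'family'.

    Sharing a mentor is the necessary condition; the other two filters
    are optional and correspond to the open questions in the notes.
    """
    by_mentor = defaultdict(list)
    for mentee, mentor, stop_year, area in records:
        by_mentor[mentor].append((mentee, stop_year, area))

    pairs = set()
    for mentor, trainees in by_mentor.items():
        for (a, year_a, area_a), (b, year_b, area_b) in combinations(trainees, 2):
            # ASSUMPTION: a small StopYear gap stands in for overlapping study periods.
            if max_year_gap is not None and abs(year_a - year_b) > max_year_gap:
                continue
            if same_area and area_a != area_b:
                continue
            pairs.add((mentor, min(a, b), max(a, b)))
    return pairs


print(sorted(sibling_pairs(records)))
print(sorted(sibling_pairs(records, max_year_gap=3)))
print(sorted(sibling_pairs(records, same_area=True)))
```

Making each extra condition a switch lets the resulting family sizes be compared before deciding which combination of conditions to keep.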
A few trainees have a high number of mentors (the maximum is 7).
About 98.2% of trainees have one mentor, and 1.6% have two.
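
A distribution like this one can be obtained by counting distinct mentors per trainee. The sketch below uses invented (MenteeID, MentorID) pairs, so its output does not reproduce the percentages above; it only illustrates the counting step.

```python
from collections import Counter, defaultdict

# Toy (MenteeID, MentorID) pairs; the values are illustrative only.
# The repeated pair (5, 7) mimics one trainee with two connections to the same mentor.
pairs = [(1, 3), (2, 3), (4, 3), (4, 7), (5, 7), (5, 7)]

# Collect the set of distinct mentors for every trainee.
mentors_of = defaultdict(set)
for mentee, mentor in pairs:
    mentors_of[mentee].add(mentor)

# How many trainees have 1, 2, ... mentors.
distribution = Counter(len(mentors) for mentors in mentors_of.values())
n_trainees = len(mentors_of)
shares = {
    k: round(100 * v / n_trainees, 1) for k, v in sorted(distribution.items())
}

print(dict(distribution))
print(shares)
print("max mentors per trainee:", max(distribution))
```

Using a set per trainee means that repeated connections to the same mentor, for example as graduate student and then as postdoctoral fellow, are counted once.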
MentorshipType only) had one trainee in particular StopYear

| | Overall (N=375545) |
|---|---|
| as.factor(Mentee_count_type) | |
| 1 | 294452 (78.4%) |
| 2 | 57802 (15.4%) |
| 3 | 15142 (4.0%) |
| 4 | 4669 (1.2%) |
| 5 and more | 3480 (0.9%) |
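
A table of this shape can be produced by counting trainees per group and top-coding the counts at "5 and more". In the sketch below the grouping key (MentorID, MentorshipType, StopYear) is an assumption, because the sentence introducing the table is incomplete. The rows and the helper `bin_count` are hypothetical.

```python
from collections import Counter

# Toy rows (hypothetical): (MentorID, MentorshipType code, StopYear, MenteeID)
rows = [
    (3, 1, 2005, 1), (3, 1, 2005, 2), (3, 1, 2005, 3),
    (3, 1, 2005, 4), (3, 1, 2005, 5), (3, 1, 2005, 6),
    (3, 1, 2006, 7),
    (3, 2, 2006, 8), (3, 2, 2006, 9),
    (7, 1, 2006, 10),
]


def bin_count(n, cap=5):
    """Top-code a count: values at or above `cap` share one label."""
    return str(n) if n < cap else f"{cap} and more"


# ASSUMPTION: one group per mentor, mentorship type and StopYear.
per_group = Counter((mentor, mtype, year) for mentor, mtype, year, _ in rows)

# Number of groups falling into each size bin.
table = Counter(bin_count(n) for n in per_group.values())
n_groups = sum(table.values())

for label in ["1", "2", "3", "4", "5 and more"]:
    share = 100 * table[label] / n_groups
    print(f"{label:10s} {table[label]:3d} ({share:.1f}%)")
```

Under this reading, the N in the table header would be the number of groups rather than the number of connections.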
StopYear is the
main case
#### Figure 6: And again a story about the structure of the data by year